Abstract

Background: Studies of online databases showed that convincing examples of eukaryote PPKs derived from 
bacteria type PPK1 and PPK2 enzymes are rare and currently confined to a few simple eukaryotes. These enzymes 
probably represent several separate horizontal transfer events. Retention of such sequences may be an advantage 
for tolerance to stresses such as desiccation or nutrient depletion for simple eukaryotes that lack more sophisticated 
adaptations available to multicellular organisms. We propose that the acquisition of encoding sequences for these 
enzymes by horizontal transfer enhanced the ability of early plants to colonise the land. The improved ability to 
sequester and release inorganic phosphate for carbon fixation by photosynthetic algae in the ocean may have 
accelerated or even triggered global glaciation events. There is some evidence for DNA sequences encoding PPKs 
in a wider range of eukaryotes, notably some invertebrates, though it is unclear whether these represent functional 
genes. 

Polyphosphate (poly P) is found in all cells, carrying out a wide range of essential roles. Studied mainly in 
prokaryotes, the enzymes responsible for synthesis of poly P in eukaryotes (polyphosphate kinases PPKs) are not 
well understood. The best characterised enzyme from bacteria known to catalyse the formation of high molecular 
weight polyphosphate from ATP is PPK1, which shows some structural similarity to phospholipase D. A second 
bacterial PPK (PPK2) resembles thymidylate kinase. Recent reports have suggested a widespread distribution of 
these bacteria type enzymes in eukaryotes. 

Results: Online databases show evidence for the presence of genes encoding PPK1 in only a limited number of 
eukaryotes. These include the photosynthetic eukaryotes Ostreococcus tauri, O. lucimarinus, Porphyra yezoensis, 
Cyanidioschyzon merolae and the moss Physcomitrella patens, as well as the amoeboid symbiont Capsaspora 
owczarzaki and the non-photosynthetic eukaryotes Dictyostelium (3 species), Polysphondylium pallidum and 
Thecamonas trahens. A second bacterial PPK (PPK2) is found in just two eukaryotes (O. tauri and the sea anemone 
Nematostella vectensis). There is some evidence for PPK1 and PPK2 encoding sequences in other eukaryotes but 
some of these may be artefacts of bacterial contamination of gene libraries. 

Conclusions: Evidence for the possible origins of these eukaryote PPK1s and PPK2s and potential prokaryote 
donors via horizontal gene transfer is presented. The selective advantage of acquiring and maintaining a prokaryote 
PPK in a eukaryote is proposed to be enhanced stress tolerance in a changing environment, related to the capture and 
metabolism of inorganic phosphate compounds. Bacterial PPKs may also have enhanced the abilities of marine 
phytoplankton to sequester phosphate, hence accelerating global carbon fixation. 
Background 

Recent reviews have proposed a widespread occurrence 
of horizontally transferred bacterial type polyphosphate 
kinase enzymes in eukaryotes [1]. Inorganic polyphos- 
phate (poly P) has been present since pre-biotic times 
[2] and has been proposed as an energy distributor in a 
pre-ATP world [3]. Poly P is found in organisms that 
represent species from each domain in nature: Eukarya, 
Archaea and Bacteria [4-6]. Studied mainly but not 
exclusively in prokaryotes, poly P and its associated 
enzymes are vital in diverse basic metabolism, in at least 
some structural functions and, notably, in stress re- 
sponses [7]. These numerous and unrelated roles for 
poly P are probably the consequence of its presence in 
life-forms from early in evolution [8]. The genomes of 
many bacterial species, including human pathogens, 
encode a homologue of a major poly P synthetic enzyme, 
polyphosphate kinase 1 (PPK1) based on a phospholipase D 
structure [9]. Loss of PPK1 produces reduced poly 
P levels, and deletion of the ppk1 gene in pathogens also results 
in a loss of virulence towards protozoa and animals [7]. 
A second PPK activity in bacteria, PPK2, is related to 
thymidylate kinase [10,11]. PPK2 is distinguished from PPK1 
by its preference for utilising poly P in the reversible gener- 
ation of GTP. Polyphosphate-AMP-phosphotransferase 
(PAP) uses poly P to phosphorylate AMP to ADP [8]. Other 
enzymes that influence accumulation of poly P are the 
hydrolytic enzymes: exopolyphosphatase (PPX) that releases 
Pi from the ends of poly P [12] and endopolyPases (PPN) 
that cleave poly P to progressively shorter chains. These en- 
zymes together maintain poly P metabolism and catabolism 
in bacteria. Poly P metabolism is less well characterised in 
eukaryotic systems. In Dictyostelium discoideum a PPK activ- 
ity (DdPPK2) based on three actin like proteins has been 
documented [13]. Hothorn et al. 2009 [14] identified a PPK 
activity associated with a fourth class of enzyme, a vacuolar 
transport chaperone (VTC), which has a distribution largely 
limited to simple eukaryotes. Intriguingly, Reusch et al. 
(1997) [15] assigned a PPK activity to a Ca2+ ATPase 
in humans. Recently Lonetti et al. (2011) [16] demonstrated 
Saccharomyces cerevisiae to have undetectable levels of poly 
P in mutants unable to produce inositol pyrophosphates. 
Hence, a range of families of enzymes have been shown to 
have PPK activity. The enzymes responsible for poly P 
synthesis in most eukaryotes remain unidentified with the 
exceptions of an actin related protein in Dictyostelium 
discoideum [13], vacuolar transport chaperones in Saccharomyces 
cerevisiae [14] and a Ca2+ ATPase in Homo sapiens 
[15]. In this respect, the hypothesis that bacteria-like PPK 
enzymes based on phospholipase D [9] or thymidylate kinase 
[10,11] exist in eukaryotes requires investigation. 

Horizontal gene transfer (HGT) is now recognised as 
an important force in eukaryote genome evolution [17]. 
Hooley et al. (2008) [18] summarised evidence that 
suggested a very limited distribution of bacterial type 
PPK1 and PPK2 enzymes in a small number of eukaryotes. 
Rao et al. (2009) [1] have claimed evidence for a 
surprisingly wide potential distribution of the bacteria 
type PPK1 and PPK2 in eukaryotes. Computer aided bioinformatics 
techniques can exploit genome project databases 
swiftly to summarise likely candidates for PPK 
activity. Similarity search tools such as BLAST [19] and 
multiple alignment programs like Clustal W/X, TCoffee 
and MUSCLE [20-22] allow rapid comparisons of se- 
quence data. Phylogenetic analyses [23,24] can infer evolu- 
tionary relationships between DNA or protein sequences. 
The current paper aims to examine the evidence for 
bacteria type polyphosphate kinases in eukaryotes and to 
consider their relationships to possible donor prokaryotes. 
The possible selective advantages in acquiring such 
prokaryote genes are discussed. 

Results 

Bacteria type PPKs on Interpro 

An initial analysis of PPKs listed on Interpro was carried 
out to eliminate any sequences with weak support 
for their annotations. Table 1 reports annotations for 
PPK1 and PPK2 for eukaryotes held under 4 different 
Interpro accession numbers. The annotations of three 
Populus trichocarpa (B9PBE1, B9PDP9 and B9NJ30) 

Table 1 A summary of bacteria type polyphosphate kinase accessions for eukaryotes on Interpro (12/2/12) 

Interpro entry              Accession   Species 
IPR003414 (PPK1)            A2VBB6      Porphyra yezoensis 
                            A4RQI1      Ostreococcus lucimarinus 
                            A9U2N0      Physcomitrella patens 
                            B9PBE1      Populus trichocarpa 
                            Q01H21      Ostreococcus tauri 
                            Q2MEV6      Physcomitrella patens 
                            D3B5H9      Polysphondylium pallidum 
                            Q54BM7      Dictyostelium discoideum 
                            E9CFK0      Capsaspora owczarzaki 
                            F4PF87      Batrachochytrium dendrobatidis 
IPR022488 (PPK2 related)    A8DVD6      Nematostella vectensis 
                            B9NJ30      Populus trichocarpa 
                            B9PDP9      Populus trichocarpa 
                            Q015Y3      Ostreococcus tauri 
IPR022486 (PPK2, PA0141)    Q015Y3      Ostreococcus tauri 
IPR016898 (PPK2)            Q015Y3      Ostreococcus tauri 
                            A8DVD6      Nematostella vectensis 
accessions are questionable. The encoding sequence 
for B9PBE1 (PPK1) has previously been reported to 
show 100% identity to DNA from bacteria [18]. Similarly, 
BLASTn searches at NCBI reveal 98% and 100% 
identity over the entire coding regions of B9PDP9 and 
B9NJ30 (both PPK2) respectively to Delftia and 
Cupriavidus bacteria. A PPK1 (F4PF87) is listed on 
Interpro for the chytrid Batrachochytrium dendrobatidis 
strain JAM81 (http://genome.jgi-psf.org/Batde5/Batde5.home.html). 
This accession lacks introns and is absent 
from a second strain JEL423 (http://www.broadinstitute.org/annotation/genome/batrachochytriumdendrobatidis/MultiHome.html). 
It is annotated only as a predicted 
protein on a short scaffold without any determined ESTs. 
However, it shows only limited DNA similarity to bacterial 
genomes (best match 68% of 1728 bases identical to Bacillus 
cytotoxicus), which may explain why the sequence was 
not annotated out of the current draft of the genome 
sequence. These doubtful annotations were excluded from 
further analysis. 
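
The screening reasoning applied to these accessions can be summarised as a small decision rule. The Python sketch below is illustrative only: the `Candidate` record, the `screen` function and the 95% identity cutoff are assumptions introduced here for clarity, not tools or thresholds used in this study. The example records reuse the identities and the intron and EST observations reported above; `None` marks a value that is not reported in the text.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff: the text reports 98-100% identity for the Populus
# accessions but does not define a formal threshold.
CONTAMINANT_IDENTITY = 95.0


@dataclass
class Candidate:
    accession: str
    bacterial_identity: float   # best BLASTn % identity to bacterial DNA
    introns: Optional[int]      # None when not reported
    est_support: bool           # any EST/transcript evidence


def screen(c: Candidate) -> str:
    """Classify an annotated eukaryote PPK accession (illustrative rule)."""
    # Near-identical DNA to a bacterial genome suggests library contamination.
    if c.bacterial_identity >= CONTAMINANT_IDENTITY:
        return "likely contaminant"
    # An intronless gene with no transcript evidence remains doubtful.
    if c.introns == 0 and not c.est_support:
        return "doubtful"
    return "retain"


candidates = [
    Candidate("B9PBE1", 100.0, None, False),
    Candidate("B9PDP9", 98.0, None, False),
    Candidate("B9NJ30", 100.0, None, False),
    Candidate("F4PF87", 68.0, 0, False),
]

for c in candidates:
    print(c.accession, screen(c))
```

Under these assumed rules all four accessions are flagged, which matches their exclusion from further analysis; a different cutoff could change the labels but not the exclusion of the near-identical Populus sequences.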

Extensive searches of a number of other databases revealed 
only a small number of other PPK1 enzymes and 
no further PPK2 matches. Even using a high automatic 
BLASTp search threshold of e < 10 or 1, only compelling 
matches with tiny (le' or less) e-values were generated 
for visual confirmation. On NCBI, 73 fungi, 40 protozoa, 
27 insects and 6 nematodes gave no additional hits with 
the exception of a Dictyostelium fasciculatum PPK1 
(Genbank: EGG21828.1). No additional PPKs were 
found in 25 species of Viridiplantae, two species of diatoms 
(Thalassiosira pseudonana and Phaeodactylum 
tricornutum), the haptophyte Emiliania huxleyi, or the 
lycophyte Selaginella moellendorffii. Additional and convincing 
matches to PPK1 were found in the amoeboid symbiont 
Capsaspora owczarzaki and the protistan Thecamonas 
trahens. The red algal species Cyanidioschyzon merolae 
and Porphyra yezoensis both show convincing evidence 
of PPK1 enzymes of bacterial origin. The P. yezoensis 
PPK1 sequence is 913 amino acids in length but is incomplete 
at the N terminal as it is based on an incomplete 
mRNA sequence. 
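
The idea of searching with a permissive threshold and then retaining only compelling matches can be sketched as a filter over BLAST tabular output. The sketch below assumes the standard 12-column tabular layout (`-outfmt 6`); the function name `compelling_hits`, the cutoff of 1e-20 and the two example rows are hypothetical and are not taken from the searches reported here.

```python
import csv
import io

# Column order of the standard 12-column BLAST tabular format (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def compelling_hits(tabular_text, max_evalue=1e-20):
    """Return (subject, % identity, e-value) for hits at or below max_evalue.

    max_evalue is an assumed cutoff chosen for illustration only.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        if evalue <= max_evalue:
            hits.append((hit["sseqid"], float(hit["pident"]), evalue))
    return hits


# Two made-up rows: one strong match and one weak, short match.
example = "\n".join([
    "\t".join(["query_ppk1", "subject_A", "62.5", "690", "250", "5",
               "1", "688", "210", "899", "0.0", "905"]),
    "\t".join(["query_ppk1", "subject_B", "29.0", "34", "24", "0",
               "100", "133", "5", "38", "2.3", "28.1"]),
])

print(compelling_hits(example))
```

Hits that survive such a filter would still need the visual confirmation described above, since a low e-value alone does not distinguish a genuine eukaryote gene from a bacterial contaminant.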

Additional file 1 shows complete alignments of each 
of the eukaryote PPK1 enzymes compared with 
prokaryote and archaea controls. Within this, it can be 
seen that the C. owczarzaki protein shows distinct, 
unique inserts causing non-alignment. Such large and 
numerous inserts may have a disproportionate effect 
on phylogenetic analysis; hence this sequence was excluded 
from further analysis. Eukaryote PPK1 enzymes 
are characterised by extensive N terminals making 
them longer than the bacterial counterparts [18]. In 
addition a PPK1 was identified in the annotated but 
incomplete genome of Ostreococcus sp. RCC809 (not 
included in analysis - http://www.jgi.doe.gov). 



Table 2 summarises the intron density of these eukaryote 
ppk genes. The recently completed Dictyostelium purpureum 
genome (http://genome.jgi-psf.org/Dicpu1/Dicpu1.home.html) 
was searched using BLASTp to reveal an ortholog of 
the D. discoideum PPK1 protein (Dicpu1_45674) with 
a single intron [25]. Two ppk1 type genes are annotated 
in Physcomitrella patens. They show around 84% 
nucleotide identity with each other, presumably reflecting 
a common origin and/or duplication of a single sequence. 
These two genes encode proteins that differ in length by 
71 amino acids with the key differences appearing in the 
extended N terminal region [18]. Remarkably 21 and 22 
introns are annotated in these two ppk1 genes (Table 3; 
http://genome.jgi-psf.org/Phypa1_1/Phypa1_1.home.html). 
Capsaspora owczarzaki ppk1 (CAOG_06840T0) and 
Thecamonas trahens ppk1 (AMSG_11662.2) both show 
three introns. No introns are annotated in the ppk genes 
from the two Ostreococcus species (http://genome.jgi-psf.org/Ostta4/Ostta4.home.html; 
http://genome.jgi-psf.org/Ost9901_3/Ost9901_3.info.html) 
or the C. merolae ppk1 
gene. Figure 1 illustrates the identities between the two 
eukaryote PPK2 enzymes and a model prokaryote. Extensive 
conservation is revealed throughout the bulk of the 
sequences with some increased variability observed 
particularly at the N terminus. 
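
Statements about identity between aligned sequences depend on how identity is counted. The Python sketch below shows one common convention, counting identical residues over the columns where neither sequence has a gap. The function name and the two short aligned strings are hypothetical and are not taken from the alignment in Figure 1; other conventions (for example dividing by the full alignment length) give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over ungapped columns of a pairwise alignment.

    Both inputs must be rows of the same alignment, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # ignore columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment rows (hypothetical residues, for illustration only).
row_1 = "MKT-AYLG"
row_2 = "MRTQAY-G"
print(round(percent_identity(row_1, row_2), 1))
```

Applying the same function separately to the N terminal columns and to the remainder of an alignment is one simple way to express the kind of regional variability described above.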

Phylogenies of eukaryote bacteria type PPK1 and PPK2 

The TreeDyn analysis of PPK1 is shown in Figure 2. This 
shows that the eukaryotic sequences group together consistent 
with their taxonomic groupings. The bootstrapping 
numbers indicate that these groupings can be relied 
upon with a high degree of confidence. The most closely 
related bacterial species to all the eukaryotic PPK1 sequences 
consistently came from the cyanobacteria group 
Table 2 Summary of intron density in eukaryote ppk1 and ppk2 genes (number per gene) 

Gene    Introns   Species 
ppk1    21, 22    Physcomitrella patens 
        0         Ostreococcus lucimarinus 
        0         O. tauri 
        0         Dictyostelium discoideum 
        1         D. purpureum 
        4         Polysphondylium pallidum 
        3         Capsaspora owczarzaki 
        0         Batrachochytrium dendrobatidis 
        3         Thecamonas trahens 
ppk2    0         Nematostella vectensis 
        0         O. tauri 
Table 3 Characteristics of introns of Physcomitrella patens for genes encoding PPK1s (Interpro accession codes) 
compared to means for genome [45] 

PPK1                        Introns       Mean intron length   Total intron length   Total exon length 
                                          (bp +/- S.E.)        per gene (bp)         per gene (bp) 
Q2MEV6 (973 amino acids)    21            151 +/- 10           3160                  29 
A9UZN0 (902 amino acids)    22            168 +/- 10           3688                  2709 
P. patens mean              5.7 +/- 0.5   262 +/- 12           1330 +/- 113          1448 +/- 78 


(Figure 2). When additional top matching (e-value = 0 
from BLASTp searches) cyanobacteria are included in 
the analysis this strengthens the association of this group 
of bacteria to the eukaryotic PPKs (Additional file 2). 
When additional bottom matching cyanobacteria (but 
still with very low e-values of approximately le'^*") are 
included in the analysis no such association is seen 
(Additional file 3), indicating that the eukaryotic PPK1s 
share an origin with a subset of the cyanobacteria. When 
the analysis is repeated without the eukaryotic specific N 
extension the support for the association with Cyanothece 
is increased. Figure 3 shows the TreeDyn view of PPK2, 
which indicates three distinct groupings which are not ne- 
cessarily associated with the taxonomic classifications. 
The results show that the eukaryotes consistently cluster 
with separate groups of bacterial PPK2s. 

Invertebrates 

BLASTp searches for E. coli PPK1 and P. aeruginosa PPK2 
produced no significant similarities against 28 completed 
arthropod genomes. tBLASTn searches gave four matches 
to each sequence which are summarised in Table 4. The 
accessions for each of the Aedes aegypti matches described 
the sequences as of "probable bacterial origin" 
and none of them matched verified transcripts. DNA sequences 
for the remaining four sequences were searched 
against the entire NCBI database via BLASTn. In each case, 
excellent matches to bacterial DNA sequences were found 
with no other matching arthropod DNA. ABLF02002165.1 
gave 79% identity over 1405 bases (e = 0) to a ppk1 from 
Staphylococcus saprophyticus. EST/transcript searches on 
AphidBase revealed only two short matches (28 out of 32 
bases at e = 0.055). ACJG01018676.1 gave 69% identity 
over 644 bases (e = 2 e'^'') to a ppk1 from Conexibacter 
woesei. In this case, EST data on the Daphnia JGI site supported 
a single, intronless 807 bp long gene (e = 0). 
ABJB010687643.1 gave 97% identity over 1164 bases 
(e = 0) to a Pseudomonas fluorescens polyphosphate 
AMP phosphotransferase gene (ppk2). Only short 
EST/transcript matches to just 23 bases (e value 0.51) 
could be found on the Ixodes database for this sequence. 
Finally, ABJB010847895.1 gave 71% identity 
over 736 bases (e = 1 e'^^) to a short chain dehydrogenase/reductase 
from Burkholderia phymatum and one perfect 
transcript match to all 762 bases (e = 0) on the Ixodes 
database. This sequence matched to an intronless gene 
(ISCW024221) described as a reductase. 

Discussion 

Nearly 200 eukaryote genomes have been examined in 
the present work for evidence of bacteria type PPK1 and 
PPK2. No single database contains a definitive list of 
eukaryote bacteria type PPKs but we can conclude that 
relatively few eukaryotes possess these enzymes (Table 1, 
Additional file 1). Hooley et al. (2008) [18] reported 
extensive conservation of structure between bacterial 
PPK1s and their eukaryote counterparts. Here we demonstrate 
a similar degree of conservation between the 
two eukaryote examples of PPK2 and bacterial counterparts 
(Figure 1). There is therefore a taxonomically discontinuous 
distribution of a limited number of bacterial 
type PPKs in diverse simple eukaryotes. The most parsimonious 
explanation for such eukaryote ppk genes, several 
of which contain no introns suggesting prokaryote 
origins, is a number of independent horizontal transfer 
events from bacteria. 

At the outset, it was important to eliminate possible 
artefacts from the analysis. There are several clear examples 
of likely incorrect identifications of PPK encoding 
genes. For example the 100% DNA sequence identity of 
the proposed Populus ppk1 and ppk2 with Ralstonia/ 
Delftia seems an obvious case of bacterial contamination. 
Several hits to bacteria-type PPK1s have been 
claimed for insects [1,26]. These exciting suggestions 
must be examined in the light of the possible contamination 
of gene banks with bacterial sequences [27]. Conversely 
it is possible that vector search programs may 
erroneously eliminate real examples of horizontal 
transfers. It is important to consider that DNA sequencing 
and annotation errors may give misleading gene 
descriptions [28]. Table 4 summarises a relatively small 
number of arthropod matches that identify potential 
PPK1 and PPK2 encoding sequences. Of these, four 
have clearly been identified as likely bacterial contaminants 
by the original workers. Just two of the 
remaining four have some EST support for their presence 
as genuine eukaryote genes. Their lack of similarity 
to any other arthropod DNA sequences and high 
DNA identity to bacteria still makes their annotation 
questionable. 
Figure 1 TCoffee analysis of bacteria type PPK2 enzymes. Pseudomonas aeruginosa accession (GenBank AAN87337) compared to Interpro: 
A8DVD6 (Nematostella vectensis) and Interpro: Q015Y3 (Ostreococcus tauri). Square symbols indicate putative P loop (ATP/GTP binding site motif 
[10])/Walker A motif with Walker B and lid modules indicated [11]. Other residues: (*) perfectly conserved (cons), (:) very similar, (.) similar. Warmer 
colours (red and orange) show best-matching regions through to cooler colours (green, blue) with poorer alignment. 



Host/parasite relationships provide obvious opportunities 
for gene exchange [17]. However, in a genome 
project there is the potential for mistaking a horizontally 
transferred gene for a bacterial DNA contaminant 
acquired during gene bank construction [29]. The 
Wolbachia insect symbiont has integrated 30% of its 
genome into the Callosobruchus beetle genome; most of 
these genes are disrupted and transcriptionally inactive 

Figure 2 Phylogenetic analysis of PPK1. Numbers on the branches indicate bootstrapping values out of 100 calculated for maximum 
likelihood (expressed as a fraction of 1). Those branches showing less than 10% have been collapsed. Colours of eukaryotes indicate major 
taxonomic groupings (blue: non-photosynthetic eukaryotes, green: plants and green algae, red: red algae). 



[30]. Klasson et al. 2009 [31] demonstrated the expression 
of Wolbachia genes in Aedes mosquito. These 
observations may be consistent with early BLASTn reports 
of ppk1 matches at the DNA level to some invertebrates 
[1,26]. However, bacterial DNA may be horizontally transferred 
but not active as a PPK1 product. The twenty 
Wolbachia genomes available on NCBI were searched 
via BLASTp using the E. coli PPK1 (NP_416996) and 
P. aeruginosa PPK2 (NP_248831) protein sequences. 
No significant PPK1 (lowest e-value 2.3, just 29% 
identity over 34 amino acids) or PPK2 (two hits at e-value 
< 0.05, best being e = 0.042 with just 27% identity 
over 82 amino acids) matches were found. The source 
of potential PPK encoding sequences, whether active 
or not, in invertebrates remains a puzzle. Claims for 
the widespread occurrence of bacteria type PPKs in 
eukaryotes [1] are overstated and these enzymes show 
a much more restricted distribution. 

Figure 2 demonstrates quite distinct clusters of PPK1 
enzymes in eukaryotes. These clusters are consistent with 
the taxonomic groupings of these eukaryotes. P. patens, 
O. tauri and O. lucimarinus form one group (green 
plants and green algae), the three Dictyostelium species, 
P. pallidum and T. trahens (non-photosynthetic eukaryotes) 
a second group, with P. yezoensis and C. merolae 
(red algae) forming a third group. All of these groups are 



Whitehead et al. BMC Research Notes 2013, 6:221 
http://www.biomedcentral.com/1756-0500/6/221 



Figure 3 Phylogenetic analysis of PPK2. Numbers on the branches indicate bootstrapping values out of 100 calculated for maximum 
likelihood (expressed as a fraction of 1). Those branches showing less than 10% have been collapsed. Colours of eukaryotes indicate major 
taxonomic groupings (blue: metazoa, green: plants and green algae). 



well supported by bootstrapping values. The most obvious 
donor for horizontally transferred PPK1s in eukaryotes is 
an ancestor in common with the cyanobacteria as shown 
by the association of the Cyanothece sp. PPK1 with the 
eukaryotic grouping. The cyanobacteria formed the original 
endosymbionts generating chloroplasts. The two 

Table 4 Summary of E. coli PPK1 and P. aeruginosa PPK2 
tBLASTn searches against NCBI Arthropoda database (28 
species) displaying hits with e-values < 0.05 

Species                                                               Accession          E value 
PPK1 
Aedes aegypti strain Liverpool cont1.27756                            AAGE02027756.1     2e-' 
Aedes aegypti strain Liverpool cont1.32234                            AAGE02032234.1     1 e-^ 
Acyrthosiphon pisum strain LSR1 cont 2200                             ABLF02002165.1     3e-^ 
Daphnia pulex DAPPU scaffold_13832_cont 18676                         ACJG01018676.1     1 e-^ 
PPK2 
Aedes aegypti strain Liverpool cont1.29024                            AAGE02029024.1     3 e""" 
Aedes aegypti strain Liverpool cont1.30944                            AAGE02030944.1     3 e"^^ 
Ixodes scapularis strain Wikel colony g cont_1108378508994            ABJB010687643.1    5 e"^^ 
Ixodes scapularis strain Wikel colony g cont_1108379235394            ABJB010847895.1    0.083 




Ostreococcus species are generally considered to be very 
divergent, with an average of only 70% amino acid identity 
between orthologous proteins, making O. tauri and O. 
lucimarinus amongst the most dissimilar members of the 
same genus in any eukaryote [32]. The most parsimonious 
explanation may be the acquisition of PPK1 when 
Ostreococcus and Physcomitrella last shared a common 
ancestor, with subsequent losses in other lines. Attempting 
to trace a common or single origin of a specific ppk gene 
may be unrealistic, particularly in light of the complex 
evolution of algae with the potential for secondary horizontal 
gene transfer events [33]. 

There are only two convincing eukaryote PPK2s found 
on the Interpro database (Figure 1). Phylogenetic ana- 
lysis suggests that these two have quite different origins 
(Figure 3). Interestingly, O. lucimarinus appears not to 
have a ppk2 - presumably the O. tauri example was 
gained after the species separated or a ppk2 sequence 
inherited from a common ancestor was subsequently 
lost by one species. Derelle et al. 2008 [34] describe one 
possible candidate virus, OtV5, as an agent of horizontal 
transfer in this genus. Raymond and Blankenship (2003) 
[35] emphasise the importance of HGT in evolution of 
eukaryotic algae with endosymbiosis extending beyond 
the original event of engulfment of cyanobacteria to 
create plastids to include acquisition of genes from other 
algae at other times. Rohwer and Thurber (2009) [36] 
give further examples of HGT into metazoans within the 
marine environment including viral vectors moving 
genes between animals and plants. 

It is also important to highlight those groups of organ- 
isms which are notable by their absence from the short 
list of eukaryote PPKs. Horizontally transferred genes 
have been shown to affect the metabolism of numerous 
fungal species [37]. Yet no examples of PPK1 or PPK2 
were observed in the 116 annotated or partially completed 
fungal genomes investigated here. However, since 
fungi do not engulf organisms via phagotrophy, this absence may 
provide an additional clue to the origin of the horizontally 
transferred PPKs in other eukaryotes. Similarly, 
most simple eukaryotes and almost all photosynthetic 
organisms examined did not contain these bacteria type 
enzymes. 

Horizontally transferred genes from eubacteria would be 
expected at least initially to contain no introns. Assuming 
that the bacteria type PPK1 and PPK2 enzymes have been 
acquired from bacteria, how and why do horizontally 
transferred genes acquire introns? Table 2 reveals that 
some eukaryote ppk1 and ppk2 examples are indeed intron 
free. Spliceosomal introns are typically found in nuclear ge- 
nomes, and their presence indicates a major role in evolu- 
tion, although no overall general function is known. It is 
known that organisms with shorter life cycles tend to have 
less intronization perhaps reflecting selection for reduced 
processing time for mRNAs. Intron length has been posi- 
tively correlated to gene expression in unicellular eukaryotes 
but negatively correlated in multicellular eukaryotes whilst 
mildly deleterious elements may accumulate in more complex 
organisms that have small populations [38,39]. In the 
examples shown in the present work there is a range from 
ppk1 genes with no introns (such as in Ostreococcus and D. 
discoideum) to the 21 and 22 introns found in P. patens 
(Table 2). Neither example of eukaryote ppk2 has introns 
(O. tauri, N. vectensis). Rotifers, which also prey on bacteria, 
have acquired many bacterial genes and whilst some are defective, 
others are expressed and these may include genes 
with introns [40]. This compares with relatively little evidence 
of intron gain in Entamoeba horizontally transferred 
genes [41] or in the recent evolutionary past of higher plants 
[42]. There is no single source of data expressing intron 
density for each of these eukaryote species in a common for- 
mat, with some authors and databases quoting introns per 
gene, introns per transcript, introns per kb of coding 
sequence or introns per spliced gene. The C. merolae and D. 
discoideum genomes have means of just 0.005 and 1.31 in- 
trons per gene respectively [39] so absence or a low number 
of introns in their ppk genes is not remarkable. A possible 
mechanism of intron gain could be increased transposon 
activity that activated double stranded DNA repair [43]. 



In P. patens an accepted horizontally transferred gly- 
cerol/water channel gene has 5 introns [44]. There are 
four recognised domains for PPKl [9] so multiple in- 
trons are unlikely to have evolved as a means to promote 
domain shuffling. Stenoien (2007) [45] suggests that 
highly expressed genes in P. patens have shorter introns 
than genes with low expression levels and so acquisition 
of small introns may reflect mechanisms to regulate 
gene expression (Table 3). It has been suggested that 
there has been an ancestral duplication of the P. patens 
genome [27,46] with essential genes such as radSl (re- 
combination repair) being present in two copies, thus 
allowing pseudoallelism and the protection of an essen- 
tial function in the gametophyte (haploid) plant. Hence, 
the presence of two copies of a ppkl gene in this species 
suggests an essential function. Table 3 implies that the 
two P. patens ppkl genes have an unusually high number 
of introns. However, Csuros et al. 2011 [43] suggest that 
this species has a mean of 5.5 introns per kb of coding 
sequence. On this basis, the apparently large number of 
introns may, at least partially, reflect the unusually large 
size of these two genes for the species. Sucgang et al. 2011 
[47] describe mean intron values of 1.9 and 1.5 per spliced 
gene for D. discoideum and D. purpureum respectively. 
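As a rough illustration of this scaling argument, the expected intron count for a gene can be estimated as the genome-wide density multiplied by the coding length. The sketch below is not part of the original analysis: the function name and the 3 kb coding length are hypothetical, and only the figure of 5.5 introns per kb comes from the text [43].

```python
def expected_introns(cds_length_bp: float, introns_per_kb: float) -> float:
    """Expected intron count for a gene, given a genome-wide intron density."""
    if cds_length_bp < 0 or introns_per_kb < 0:
        raise ValueError("length and density must be non-negative")
    # Density is quoted per kilobase of coding sequence, so convert bp to kb.
    return introns_per_kb * cds_length_bp / 1000.0


# 5.5 introns per kb is the P. patens mean quoted in the text [43];
# the 3000 bp coding length is a hypothetical value for illustration only.
example = expected_introns(3000, 5.5)
print(example)
```

Under these assumptions a long coding sequence is expected to carry proportionally more introns, so a high absolute intron count need not be unusual for the species.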
D. discoideum and D. purpureum diverged approxi- 
mately 400 million years ago [47] so the acquisition of 
PPK1 in these and the other slime moulds presumably
predates this speciation event. The colonisation of the 
land by plants around 470 million years ago was 
followed by the divergence of the line leading to 
Physcomitrella from that leading to Selaginella and 
higher plants about 430 million years ago [48]. Hence, 
it is reasonable to suggest that one horizontal transfer 
event may have occurred in a common ancestor of 
these species and the genes have been maintained by 
common selective pressures. In individual species such 
as Selaginella the ppk genes have subsequently been 
lost. Although this is the most parsimonious explan- 
ation, the occurrence of multiple acquisitions and 
losses should not be ruled out. 

What are the advantages to a eukaryote in maintaining 
a bacteria-type PPK? As poly P is found in all cells there 
must be alternative mechanisms for manufacture in eukaryotes,
perhaps based on the actin related PPK3 system [18].
Horizontally acquired PPK1 or PPK2 must
replace or supplement such native enzymes. Indeed, the 
primitive red alga C. merolae has a VTC1p homologue
in a poly P containing vacuole. Poly P is a key store of 
phosphate in this acid and heat tolerant species rather 
than phytic acid commonly found in higher plants [49]. 
Similarly poly P is the stored form of phosphate in green 
algae [50], hence acquiring PPK1 or PPK2 may provide a
greater flexibility in nutrient stress responses in algal 
cells. O. tauri is a unicellular green alga that is an 
important member of the global phytoplankton and has
cell dimensions of around 1 µm diameter, equivalent to
a prokaryotic cell. In O. tauri nitrogen starvation results 
in an increase in PPK activity [51]. So poly P accumula- 
tion via enhanced PPK activity is clearly a valuable re- 
sponse to nutrient depletion stress, possibly to maintain 
phosphate reserves. Such activity may withdraw more 
reactive and soluble phosphate molecules from metabolism
or potential efflux from a tiny cell, like O. tauri or C.
merolae, with a high ratio of surface area to volume. 

Simple multicellular eukaryotes may face periods of 
water immersion followed by desiccation. For example, 
mosses such as P. patens and slime moulds both have a 
need to escape aqueous environments to sporulate [52]. 
Desiccation tolerance in individual cells and tissues is required
in plants such as P. yezoensis and P. patens,
which lack the efficient vascular systems of higher plants 
[27]. Under such circumstances there may be selection 
pressure to use horizontally transferred genes available 
in the surrounding bacterial population. Higher plants 
have more sophisticated water transport, compatible sol- 
ute manufacture and waxy cuticles so that individual 
cells are no longer desiccated. Similarly, filamentous 
fungi have thick chitinous walls and the VTC PPK sys- 
tem [14], so may be more desiccation tolerant anyway. 
Hence slime moulds and mosses may represent special 
cases of incomplete adaptations to dry land colonisation. 

Eukaryote PPK1s are characterised by being of higher
mass than prokaryote homologs [18]; N terminal
extensions perhaps reflect differences in targeting the 
enzyme, e.g. to a membrane or subcellular compartment, 
or in assuming quaternary structures. However, Target 
P and Signal P analysis [53] failed to show any signal 
peptides or common chloroplast or mitochondrial 
targeting sequences amongst the eukaryote PPK1s or
PPK2s. Zhao et al. (2008) [54] demonstrated that poly 
P influences intron splicing protein localisation and 
concentration of poly P subapically in E. coli and plays 
a crucial role in establishing cell polarity in cytokinesis. 
Subcellular localisation of PPK activity in eukaryotes is 
then also presumably important. Inorganic phosphate 
and smaller molecules such as ATP and GTP are 
highly reactive and PPK provides a mechanism for 
storage of phosphate in the less reactive high molecu- 
lar weight poly P. For some eukaryotes, such as those 
adapting to novel stressful environments without the 
benefit of a developed vascular system, the acquisition 
of a bacteria type PPK and the evolution of additional 
ppk genes under the control of new promoters may 
provide additional opportunities for the control of poly 
P synthesis within the cell. A ppk1 deletion mutant of
the social slime mould D. discoideum had reduced 
levels of poly P and was deficient in development, 
sporulation and predation [55]. 



Lenton et al. (2012) [56], using P. patens as an experi- 
mental organism, have recently described the exciting 
hypothesis that non-vascular plants colonising rock sur- 
faces accelerated chemical weathering, releasing phos- 
phate for enhanced growth of oceanic phytoplankton, to 
the extent that falls in atmospheric carbon dioxide pre- 
cipitated the global growth of ice sheets. Central to this 
concept is the release of phosphate to the ocean to fuel 
oceanic carbon fixation. Interestingly, two other eukaryotes
that have bacteria type PPK1 and PPK2 are the
abundant picophytoplankton Ostreococcus tauri and O. 
lucimarinus (Table 1 [18]). In this respect, horizontal 
transfer of bacterial ppk genes to early plants such as 
P. patens colonising the land and to marine phyto- 
plankton exploiting the consequent increase in oceanic 
phosphate, would have been a key factor in the slow 
decline in atmospheric carbon dioxide in the Ordovician. 

Conclusions 

Convincing database examples of eukaryote PPKs derived
from bacteria type PPK1 and PPK2 enzymes are
rare and currently confined to a few simple eukaryotes. 
These enzymes likely represent horizontal transfer 
events occurring before the time of the colonisation of 
land by plants, with the possibility of subsequent mul- 
tiple losses and further gains in different lineages. It is 
proposed that the retention of such horizontally trans- 
ferred sequences is an advantage for stress tolerance in 
eukaryotes without sophisticated multicellular adapta- 
tions to stresses such as desiccation or nutrient deple- 
tion. The enhanced acquisition, release and storage of 
phosphates facilitated by bacterial PPKs may have pro- 
moted the colonisation of land by early plants and 
fuelled the growth of oceanic phytoplankton. There is 
very limited evidence for DNA sequences encoding 
PPKs in a wider range of eukaryotes, notably some in- 
vertebrates, though it is less clear that these represent 
functional genes. 

Methods 

Identification of PPK1 and PPK2 sequences in eukaryotes

The Interpro database (http://www.ebi.ac.uk/interpro/) 
was searched using keywords "polyphosphate kinase" 
and individual eukaryotic accessions collated. Blastp 
searches [19] at NCBI with the genome database
(http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi) were used
for direct access to individual sequences of bacterial
PPK1/2 representatives by using E. coli PPK1 (UniProt
P0A7B1) and P. aeruginosa PPK2 (Genbank NP_248831) 
as in silico probes. A default e-value of < 10 was used as 
the cut off to then determine manual scrutiny of any hits 
for the presence of conserved residues when screening 
these multiple species databases. The top matching bac- 
terial sequence from each major bacterial grouping was 
chosen as the representative of that group. Some individual
genomes at JGI (http://www.jgi.doe.gov/) were
accessed, with a default BLASTp e-value of < 1 used to de- 
termine manual assessment of a match. D. fasciculatum 
data was accessed via the Social Amoebas Comparative 
Genome Browser (http://sacgb.fli-leibniz.de/cgi/index.pl?ssi=free).
The C. merolae genome was accessed at
http://merolae.biol.s.u-tokyo.ac.jp/. Phytozome v.7
(http://www.phytozome.net/) was used to screen a collection of plant
genomes by Blastp for visual assessment at e < 1. The BOGAS site
(http://bioinformatics.psb.ugent.be/genomes/) was used to access the
Ectocarpus siliculosus genome. The origins of multicellularity database
(http://www.broadinstitute.org/annotation/genome/multicellularity_project/MultiHome.html)
was used to screen some simple eukaryotes. For
arthropod sequences primary amino acid sequences for 
E. coli PPK1 (Genbank NP_416996) and P. aeruginosa
PPK2 (Genbank NP_248831) were used to search via 
Blastp, predicted proteins of the 28 completed arthro- 
pod genomes available at NCBI. The same protein se- 
quences were then used to search genomic DNA using 
tBlastn to detect potential PPK encoding sequences. 
Any matches to accessions at lower than e-value = 0.05 
were then scrutinised by EST/transcript searches 
at species specific databases: for Aedes aegypti 
(SRA transcripts at http://blast.ncbi.nlm.nih.gov/), 
Acyrthosiphon pisum (http://tools.genouest.org/tools/aphidblast/),
Daphnia pulex (http://genome.jgi-psf.org/Dappul/Dappul.home.html)
and Ixodes scapularis
(http://iscapularis.vectorbase.org/). 
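The tiered e-value cut-offs described above (< 10, < 1 and < 0.05, each followed by manual scrutiny) can be sketched as a simple filter. This is a minimal illustration only, not the procedure actually used: it assumes hits exported in BLAST+ tabular format (-outfmt 6, default 12 columns, e-value in column 11), whereas the searches here were run through web interfaces. The hit names and values are hypothetical.

```python
from typing import Iterable, List, Tuple


def filter_hits(lines: Iterable[str], max_evalue: float) -> List[Tuple[str, str, float]]:
    """Return (query, subject, e-value) for hits with e-value below the cut-off.

    Assumes BLAST+ tabular output (-outfmt 6) with the default 12 columns,
    in which the e-value is the 11th column.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Hypothetical hits for illustration only; not data from this study.
rows = [
    "P0A7B1\thit_A\t35.2\t600\t380\t9\t1\t590\t20\t610\t2e-45\t180",
    "P0A7B1\thit_B\t24.0\t80\t58\t3\t100\t179\t5\t82\t3.2\t30.1",
    "P0A7B1\thit_C\t22.0\t50\t39\t0\t1\t50\t1\t50\t12\t25.0",
]

for cutoff in (10, 1, 0.05):
    print(cutoff, [subject for _, subject, _ in filter_hits(rows, cutoff)])
```

A permissive cut-off deliberately retains weak hits; in the approach described here those hits were then assessed manually for conserved residues rather than accepted on e-value alone.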

All eukaryotic sequences identified were checked for 
annotation within expression data and the nucleic acid 
sequence searched against all bacterial databases. To
avoid artefacts from sequence contamination, any eukaryotic
sequence whose nucleic acid sequence was identical or
virtually identical to a bacterial sequence was excluded.
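The contamination check can be illustrated with a short sketch. No numerical threshold is stated for "virtually identical", so the 99% cut-off below is a hypothetical value chosen for illustration, as are the function names and the example sequences; the sequences are assumed to be already aligned and of equal length.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned nucleotide sequences.

    Columns that are gaps ('-') in both sequences are ignored; a gap
    opposite a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def is_probable_contaminant(eukaryote_seq: str, bacterial_seq: str,
                            threshold: float = 99.0) -> bool:
    """Flag a eukaryotic sequence that is (virtually) identical to a bacterial one.

    The 99% default is an assumed threshold, not one given in the text.
    """
    return percent_identity(eukaryote_seq, bacterial_seq) >= threshold


# Hypothetical aligned sequences for illustration only.
print(is_probable_contaminant("ACGTACGTAC", "ACGTACGTAC"))
print(is_probable_contaminant("ACGTACGTAC", "ACGTACGTAA"))
```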

Sequence alignment and phylogenetic analysis 

Identity within different PPK1 and PPK2 proteins was
observed throughout the sequences, hence entire pro- 
teins were used for the analysis. Where eukaryotic pro- 
teins had N terminal extensions, these were removed for 
some analyses but demonstrated very similar results as 
using the entire proteins. ClustalW (http://www.ebi.ac.uk/)
and TCoffee [20] (http://igs-server.cnrs-mrs.fr/Tcoffee/tcoffee_cgi/index.cgi)
were used to construct multiple alignments, which were manually
refined. Domain identification used CDD
(http://www.ncbi.nlm.nih.gov/cdd) and SMART
(http://smart.embl-heidelberg.de/smart/set_mode.cgi?NORMAL=1).
For phylogenetic analysis both ClustalX [21] and MUSCLE [22] were
used for alignment. Maximum likelihood (PhyML 3.0, using the WAG
model of sequence evolution) and Neighbour joining
(Phylip, using the PAM 350 model of sequence evolution)



analysis was viewed with Treedyn [23] or Treeview 
[24] and produced similar results. The presented ana- 
lysis is based upon MUSCLE and Treedyn with boot- 
strapping values out of 100 presented (expressed as a 
fraction of 1). Percentage GC was calculated using 
http://www.genomicsplace.com/gc_calc.html. Additional file 
4 provides the species used for the phylogenetic analysis. 
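For readers wishing to reproduce the percentage GC figures without the web tool, the calculation is a simple base count. The sketch below is an assumed equivalent, not the code behind the cited calculator; in particular, ignoring ambiguous bases such as N is an assumption, and the example sequences are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical sequences for illustration only.
print(gc_percent("ATGC"))
print(gc_percent("ggccNN"))
```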

Availability of supporting data 

Figures 2 and 3 are deposited at TreeBASE
(http://purl.org/phylo/treebase/phylows/study/TB2:S13979).

Additional files 



Additional file 1: TCoffee multiple alignment of PPK1 from
eukaryotes. Advanced TCoffee alignment of eukaryotic PPK1s compared
with prokaryotic (Uniprot: P0A7B1 Escherichia coli) and archaeal (Interpro:
A2SQZ9 Methanocorpusculum labreanum) enzymes. Interpro: Q54BM7
Dictyostelium discoideum, Social amoebas comparative genome browser
(SACGB): C300023 (DPU_G0062710) D. purpureum, Interpro: A2VBB5
Porphyra yezoensis, Interpro: Q2MEV6 Physcomitrella patens, Interpro:
A9U2N0 P. patens, Interpro: Q01H21 Ostreococcus tauri, Interpro: A4RQI1
O. lucimarinus, Interpro: D3B5H9 Polysphondylium pallidum, Interpro:
E9CFK0 Capsaspora owczarzaki, SACGB: EGG21828.1 D. fasciculatum,
C. merolae genome database: CMM026C Cyanidioschyzon merolae, Interpro:
F4PF87 Batrachochytrium dendrobatidis, Origins of multicellularity
database: AMSGl 1662 Thecamonas trahens. Highly conserved active site
residues (f) [18].

Additional file 2: Phylogenetic analysis of PPK1 including closest 
matching cyanobacteria. Numbers on the branches indicate 
bootstrapping values out of 100 calculated for maximum likelihood. 
Colours of eukaryotes indicate major taxonomic groupings (blue:
non-photosynthetic eukaryotes, green: plants and green algae, red: red algae,
purple: cyanobacteria).

Additional file 3: Phylogenetic analysis of PPK1 including weakest 
matching cyanobacteria. Numbers on the branches indicate 
bootstrapping values out of 100 calculated for maximum likelihood. 
Colours of eukaryotes indicate major taxonomic groupings (blue:
non-photosynthetic eukaryotes, green: plants and green algae, red: red algae,
purple: cyanobacteria).

Additional file 4: Representative species taken from bacterial taxa 
for phylogenetic analysis. 